Browser Extension Architecture
This document describes the architecture of a browser extension built with WXT and React. It covers the side panel UI, the background script, content script integration, and the messaging system that connects them, along with the WXT configuration, the side panel's component hierarchy, and the WebSocket connection to the backend. It also walks through lifecycle management, permission handling, cross-origin communication, the agent executor pattern, real-time conversation display, and the authentication flow, and closes with browser compatibility, packaging, and deployment strategies, plus component composition patterns and integration with browser APIs.
The extension is organized into entrypoints for background, content, and side panel, plus shared utilities and hooks. WXT manages build, manifest generation, and browser-specific targets.
Side Panel (React): Hosted inside a shadow root UI, renders the main app, manages authentication, WebSocket connectivity, tab management, and the agent executor.
Background Script: Handles cross-tab messaging, executes agent tools, performs browser-level actions, and manages Gemini requests.
Content Script: Injects UI overlays and performs page-level actions when instructed by the background script.
Messaging System: Uses browser runtime messaging for background ↔ side panel and background ↔ content script communication.
WebSocket Client: Provides a minimal client for real-time agent execution updates and statistics retrieval.
Utilities: Command parsing, agent execution, and browser action execution.
The extension follows a layered architecture:
UI Layer: Side panel React app with hooks for auth, WebSocket, and tab management.
Control Layer: Side panel orchestrates agent execution and displays progress.
Communication Layer: WebSocket client for real-time updates; browser messaging for background ↔ side panel and background ↔ content script.
Execution Layer: Background script handles browser APIs and dispatches actions; content script executes DOM-level actions.
Side Panel React Application
The side panel is a React app mounted inside a shadow root UI. It initializes the app, sets up authentication, WebSocket connectivity, and tab management. It renders the agent executor and unified settings menu.
Authentication Flow
The authentication hook integrates with browser identity and a backend service to exchange OAuth codes for tokens, persist user data, and manage token refresh. It supports both Google OAuth and a demo GitHub flow.
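As a rough illustration of the code exchange and token refresh steps, the sketch below pulls the OAuth code out of a redirect URL (the kind of value `browser.identity.launchWebAuthFlow` resolves with) and decides when stored tokens should be refreshed. The `TokenSet` shape, the 60-second margin, and the injected `refresh` function are assumptions made for this example; the actual hook and backend contract are not shown in this document.

```typescript
// Hypothetical token shape; the real hook's types are not shown here.
interface TokenSet {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Assumed margin: refresh a little early so requests never carry an expired token.
const REFRESH_MARGIN_MS = 60 * 1000;

// Extracts the `code` query parameter from an OAuth redirect URL, or null if absent.
function extractAuthCode(redirectUrl: string): string | null {
  const query = (redirectUrl.split("?")[1] ?? "").split("#")[0] ?? "";
  for (const pair of query.split("&")) {
    const [key, value] = pair.split("=");
    if (key === "code" && value) return decodeURIComponent(value);
  }
  return null;
}

// True when the access token is expired or within the refresh margin.
function needsRefresh(tokens: TokenSet, now: number): boolean {
  return tokens.expiresAt - now <= REFRESH_MARGIN_MS;
}

// `refresh` stands in for the backend call that trades a refresh token for new tokens.
type RefreshFn = (refreshToken: string) => Promise<TokenSet>;

async function getValidTokens(
  current: TokenSet,
  refresh: RefreshFn,
  now: number = Date.now(),
): Promise<TokenSet> {
  return needsRefresh(current, now) ? refresh(current.refreshToken) : current;
}
```

Injecting `refresh` and `now` keeps the decision logic independent of the browser and the backend, so it can be checked without either.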
WebSocket Integration and Real-Time Updates
The WebSocket client encapsulates connection management, event emission, and agent execution. The side panel hook subscribes to connection status and progress updates, displaying real-time feedback.
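A minimal sketch of such a client is shown below, assuming a socket object that exposes `on` and `emit` in the style of a Socket.IO client. The class name `AgentSocketClient` and the event names `agent_progress` and `execute_agent` are hypothetical; only `connect` and `disconnect` are standard Socket.IO events.

```typescript
// Minimal socket surface; intended to resemble the on/emit methods of a Socket.IO client.
interface SocketLike {
  on(event: string, listener: (payload?: unknown) => void): unknown;
  emit(event: string, payload?: unknown): unknown;
}

type Status = "connected" | "disconnected";

interface Progress {
  step: string;
  percent: number;
}

class AgentSocketClient {
  private socket: SocketLike;
  private status: Status = "disconnected";
  private statusListeners = new Set<(s: Status) => void>();
  private progressListeners = new Set<(p: Progress) => void>();

  constructor(socket: SocketLike) {
    this.socket = socket;
    socket.on("connect", () => this.setStatus("connected"));
    socket.on("disconnect", () => this.setStatus("disconnected"));
    // "agent_progress" is a hypothetical event name for progress updates.
    socket.on("agent_progress", (payload) => {
      this.progressListeners.forEach((l) => l(payload as Progress));
    });
  }

  isConnected(): boolean {
    return this.status === "connected";
  }

  // Both subscribe methods return an unsubscribe function, which suits React effect cleanup.
  onStatus(listener: (s: Status) => void): () => void {
    this.statusListeners.add(listener);
    return () => { this.statusListeners.delete(listener); };
  }

  onProgress(listener: (p: Progress) => void): () => void {
    this.progressListeners.add(listener);
    return () => { this.progressListeners.delete(listener); };
  }

  // "execute_agent" is a hypothetical event name for starting a run.
  executeAgent(command: string): void {
    if (!this.isConnected()) throw new Error("WebSocket is not connected");
    this.socket.emit("execute_agent", { command });
  }

  private setStatus(next: Status): void {
    this.status = next;
    this.statusListeners.forEach((l) => l(next));
  }
}
```

A side panel hook could call `onStatus` and `onProgress` inside an effect and return the unsubscribe functions as cleanup.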
Agent Executor Pattern and Action Execution
The agent executor parses slash commands, executes agents either via WebSocket or HTTP, and triggers browser actions. It maintains chat sessions and displays formatted responses.
Execution flow (from the diagram): prepare the inputs (tabs, chat history, attachments); if the WebSocket is connected, call wsClient.executeAgent(), otherwise send a fetch() request to the backend endpoint; if the response has an action plan, run executeBrowserActions(); then format and display the response and persist sessions to storage.
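The executor flow for this section can be sketched as follows. The command grammar, the `AgentResponse` shape, and the `ExecutorDeps` interface are assumptions made for illustration; the transports and browser action runner are injected so the control flow stays visible.

```typescript
interface ParsedCommand {
  agent: string;
  prompt: string;
}

// Parses input such as "/research find papers" into an agent name and a prompt.
// The exact grammar used by the extension is not documented; this one is assumed.
function parseSlashCommand(input: string): ParsedCommand | null {
  const match = /^\/([\w-]+)(?:\s+([\s\S]*))?$/.exec(input.trim());
  if (!match) return null;
  return { agent: match[1] ?? "", prompt: (match[2] ?? "").trim() };
}

// Hypothetical response and action shapes.
interface BrowserAction {
  type: string;
  [key: string]: unknown;
}

interface AgentResponse {
  text: string;
  actions?: BrowserAction[];
}

// Everything the executor needs from its environment, injected for clarity.
interface ExecutorDeps {
  wsConnected: () => boolean;
  executeViaWebSocket: (cmd: ParsedCommand) => Promise<AgentResponse>;
  executeViaHttp: (cmd: ParsedCommand) => Promise<AgentResponse>;
  executeBrowserActions: (actions: BrowserAction[]) => Promise<void>;
  display: (text: string) => void;
  persist: () => Promise<void>;
}

// Returns false when the input is not a slash command, true after a completed run.
async function runAgent(input: string, deps: ExecutorDeps): Promise<boolean> {
  const cmd = parseSlashCommand(input);
  if (!cmd) return false;

  // Prefer the WebSocket when connected; otherwise fall back to HTTP.
  const response = deps.wsConnected()
    ? await deps.executeViaWebSocket(cmd)
    : await deps.executeViaHttp(cmd);

  // Run the action plan, if any, before showing the final response.
  if (response.actions && response.actions.length > 0) {
    await deps.executeBrowserActions(response.actions);
  }
  deps.display(response.text);
  await deps.persist();
  return true;
}
```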
Background Script Functionality and Messaging
The background script listens for messages from the side panel and content script, executes agent tools, manages tabs, and performs browser-level actions. It injects content scripts and coordinates cross-tab communication.
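One way to structure the message handling is a typed router keyed on a `type` field, sketched below. The message names (`EXECUTE_TOOL`, `GET_ACTIVE_TAB`, `PAGE_ACTION_RESULT`) and the reply envelope are hypothetical; the actual message protocol is not listed in this document.

```typescript
// Hypothetical message protocol shared by side panel, background, and content script.
type ExtensionMessage =
  | { type: "EXECUTE_TOOL"; tool: string; args: Record<string, unknown> }
  | { type: "GET_ACTIVE_TAB" }
  | { type: "PAGE_ACTION_RESULT"; ok: boolean };

type MessageType = ExtensionMessage["type"];

// A handler receives only the message variant that matches its key.
type Handler<T extends MessageType> = (
  msg: Extract<ExtensionMessage, { type: T }>,
) => Promise<unknown> | unknown;

type HandlerMap = { [T in MessageType]?: Handler<T> };

interface Reply {
  ok: boolean;
  result?: unknown;
  error?: string;
}

// Builds a dispatcher that never throws: failures are returned as { ok: false, error }.
function createRouter(handlers: HandlerMap): (message: ExtensionMessage) => Promise<Reply> {
  return async (message) => {
    const handler = handlers[message.type] as ((m: ExtensionMessage) => unknown) | undefined;
    if (!handler) return { ok: false, error: `No handler for ${message.type}` };
    try {
      return { ok: true, result: await handler(message) };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

// In the background entrypoint the router would be passed to
// browser.runtime.onMessage.addListener; how the reply is delivered
// (returned promise or sendResponse) depends on the browser API in use.
```

Returning a uniform reply envelope lets the side panel treat missing handlers and thrown errors the same way.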
Content Script Integration
The content script runs on all URLs and can be extended to perform page-level actions. It listens for messages from the background script and executes DOM operations.
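The sketch below shows how a message from the background script might be turned into a DOM operation. The action names and the `RootLike`/`ElementLike` interfaces are hypothetical; they model only the small slice of the DOM used here, so in a real content script `root` would be a thin wrapper around `document`.

```typescript
// Minimal slice of the DOM used by this sketch (assumed, not the extension's own types).
interface ElementLike {
  click(): void;
  value?: string;
  textContent: string | null;
}

interface RootLike {
  querySelector(selector: string): ElementLike | null;
}

// Hypothetical page-level actions sent by the background script.
type PageAction =
  | { type: "CLICK"; selector: string }
  | { type: "FILL"; selector: string; value: string }
  | { type: "READ_TEXT"; selector: string };

interface ActionResult {
  ok: boolean;
  data?: string;
  error?: string;
}

// Looks up the target element and applies the requested operation.
function performPageAction(root: RootLike, action: PageAction): ActionResult {
  const el = root.querySelector(action.selector);
  if (!el) return { ok: false, error: `No element matches ${action.selector}` };

  switch (action.type) {
    case "CLICK":
      el.click();
      return { ok: true };
    case "FILL":
      el.value = action.value;
      return { ok: true };
    case "READ_TEXT":
      return { ok: true, data: el.textContent ?? "" };
  }
}
```

Reporting a missing element as a result rather than throwing lets the background script decide whether to retry or surface the failure.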
The extension relies on WXT for build and manifest generation, React for UI, and Socket.IO for real-time communication. Dependencies are declared in package.json and TypeScript configuration extends WXT’s generated tsconfig.
Debounce or throttle frequent UI updates (e.g., tab list refresh) to reduce re-renders.
Batch browser API calls (tabs.query, scripting.executeScript) and cache results where appropriate.
Use lazy loading for heavy components and defer non-critical computations.
Limit DOM extraction sizes (e.g., HTML capture) and apply timeouts to prevent long-running operations.
Fall back to polling for stats retrieval when the WebSocket is unavailable.
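Two of the recommendations above, rate-limiting frequent updates and capping extracted HTML, can be sketched as small helpers. The function names and the leading-edge throttle policy are choices made for this example, not taken from the extension's code.

```typescript
// Leading-edge throttle: runs `fn` at most once per `intervalMs` and drops calls in between.
// The returned function reports whether the call ran. `now` is injectable for testing.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: A) => boolean {
  let last = -Infinity;
  return (...args: A): boolean => {
    const t = now();
    if (t - last < intervalMs) return false; // too soon, drop this call
    last = t;
    fn(...args);
    return true;
  };
}

// Caps captured HTML so a very large page cannot produce an oversized payload.
function truncateHtml(html: string, maxChars: number): string {
  return html.length <= maxChars ? html : html.slice(0, maxChars);
}
```

A tab list refresh wrapped as `throttle(refreshTabs, 500)` (with `refreshTabs` being whatever function reloads the list) would run at most twice per second regardless of how many tab events fire. A debounce, which waits for a quiet period instead, would be the alternative when only the final state matters.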
Common issues and resolutions:
WebSocket not connecting: Verify VITE_API_URL and server availability; check connection events and fall back to HTTP for stats retrieval.
Authentication failures: Ensure backend is running and identity API is available; confirm OAuth redirect URI and scopes.
Content script injection errors: Confirm permissions and that the content script path is correct; verify target tab exists.
Tab management inconsistencies: Ensure listeners are registered/unregistered on mount/unmount; use query results before acting.
Cross-origin limitations: Use host_permissions and appropriate permissions; avoid unsafe inline styles in injected UI.
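For the listener registration issue in particular, a small helper that pairs every `addListener` with its `removeListener` makes cleanup hard to forget. The `EventLike` interface below is an assumption that mirrors the `addListener`/`removeListener` pair exposed by browser extension events such as `tabs.onUpdated`; the helper name `subscribe` is hypothetical.

```typescript
// Shape shared by browser extension events (addListener/removeListener pair).
interface EventLike<L> {
  addListener(listener: L): void;
  removeListener(listener: L): void;
}

// Registers the listener and returns a function that unregisters the same listener.
function subscribe<L>(event: EventLike<L>, listener: L): () => void {
  event.addListener(listener);
  return () => event.removeListener(listener);
}
```

In a React hook, returning the result of `subscribe(...)` from an effect registers the listener on mount and removes it on unmount.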
The extension architecture cleanly separates concerns across UI, messaging, execution, and communication layers. The React-based side panel provides a modern interface with robust authentication and real-time updates via WebSocket. The background script centralizes browser API interactions and action orchestration, while the content script handles page-level operations. With proper permission handling, lifecycle management, and cross-origin considerations, the extension is ready for production deployment across browsers.
WXT Configuration and Permissions
Manifest permissions include activeTab, tabs, storage, scripting, identity, sidePanel, webNavigation, webRequest, cookies, bookmarks, history, clipboard, notifications, contextMenus, downloads.
Host permissions grant access to all URLs.
Scripts support development and build targets for Chrome and Firefox.
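The manifest portion of the configuration might look roughly like the object below. It is kept as plain data here; in `wxt.config.ts` an object of this kind is passed to `defineConfig` from the `wxt` package and exported as the default export. The permission list is abbreviated from the one above, and the variable name is hypothetical.

```typescript
// Sketch of the manifest section of a WXT config (abbreviated; names from the list above).
const extensionConfig = {
  manifest: {
    permissions: [
      "activeTab",
      "tabs",
      "storage",
      "scripting",
      "identity",
      "sidePanel",
    ],
    // Match pattern that grants access to all URLs.
    host_permissions: ["<all_urls>"],
  },
};
```

WXT's CLI selects the target browser with the `-b` flag, for example `wxt build -b firefox`, which is how per-browser scripts are typically defined in package.json.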
Browser Compatibility and Packaging
Use WXT scripts to build and package for Chrome and Firefox.
Shadow DOM UI ensures isolation and compatibility across pages.
Feature detection for browser APIs (e.g., speech recognition) prevents runtime errors.
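Feature detection for speech recognition can be sketched as below. `SpeechRecognition` and `webkitSpeechRecognition` are the global names browsers use for this API; the helper name and the choice to pass the global scope as a parameter are assumptions made for this example.

```typescript
type SpeechRecognitionCtor = new () => unknown;

// Returns the speech recognition constructor if the given global scope provides one,
// checking the standard name first and the WebKit-prefixed name second.
function getSpeechRecognition(scope: Record<string, unknown>): SpeechRecognitionCtor | null {
  const candidate = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as SpeechRecognitionCtor) : null;
}
```

In the side panel this would be called with the global object, and the voice input control would be hidden or disabled when the result is `null`.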
Cross-Origin Communication
Use host_permissions for broad access.
For OAuth, rely on browser.identity and secure redirects.
For backend communication, configure CORS and environment variables for API base URL.
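Resolving the API base URL from the environment could be sketched as follows. `VITE_API_URL` is the variable named in the troubleshooting notes; the fallback URL and the helper names are hypothetical.

```typescript
// Hypothetical fallback for local development; the real default is not documented here.
const DEFAULT_API_URL = "http://localhost:8000";

// Reads VITE_API_URL from an env-like object and strips trailing slashes.
function resolveApiBase(env: Record<string, string | undefined>): string {
  const raw = (env["VITE_API_URL"] ?? "").trim();
  return (raw || DEFAULT_API_URL).replace(/\/+$/, "");
}

// Maps http -> ws and https -> wss. Only needed for a raw WebSocket;
// a Socket.IO client accepts the http(s) URL directly.
function toWebSocketUrl(apiBase: string): string {
  return apiBase.replace(/^http/, "ws");
}
```

In a Vite-based build such as WXT's, the env-like object would be `import.meta.env`. The backend's CORS configuration still has to allow the extension's origin for HTTP requests to succeed.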